Population-sample regression in the estimation of population proportions

ABSTRACT

A method is presented for estimating population from sample proportions that produces margins of error narrower for any specific sample size or that requires a sample size smaller for any specific margin of error than do previously existing methods applied to the same data. 
     This method applies an unbiased estimator (developed in this invention) of the squared correlation between population and sample proportions to determine point and interval estimates of population proportions in a regression context involving simple random sampling with replacement. 
     In virtually all reasonable applications, assuming a Dirichlet prior distribution, the margin of error produced by this method for a population proportion is shown to be 1.96 times the posterior standard deviation of the proportion.

CROSS-REFERENCE TO RELATED DOCUMENTS

The present patent application claims the benefit of U.S. Provisional Application Ser. 60/823,625 filed on Aug. 25, 2006. The prior application is incorporated herein in its entirety by reference.

TECHNICAL FIELD

This patent application relates to the technical field of poll taking, particularly utilizing statistics which encompass the estimation of population proportions from the regression of population on sample proportions. Using an unbiased estimator of the square of the correlation between the population and sample proportions in a Bayesian context produces not only point estimates of the population proportions but also credibility intervals that are narrower than conventional confidence intervals.

BACKGROUND ART

In 1961, James and Stein derived estimators of population means that are more efficient than corresponding traditional estimators by using a linear combination of the mean of an individual sample and the overall mean of the sample aggregated with two or more other samples from possibly different populations. Being within the 0-1 interval, the weight applied to each individual sample mean is called a shrinkage coefficient.

Commenting on the empirical Bayesian treatment of James-Stein estimators by Efron and Morris (1973), Stigler (1983, 1990) showed that the shrinkage coefficient was an estimator of the squared correlation coefficient in the regression of population on sample means. Fienberg and Holland (1973) extended the empirical Bayesian treatment of James-Stein estimators to single-sample population proportions, with the expected increase in efficiency. The invention herein develops an estimator of population proportions even more efficient than the Fienberg-Holland estimator, particularly in a small sample (n<500), and demonstrates that all shrinkage coefficients are estimators of squared correlation coefficients in population-sample regression. The margins of error for the population proportions, provided they follow a Dirichlet distribution, are shown to be 1.96 times their standard deviations in virtually all realistic applications.

DISCLOSURE OF INVENTION

This invention consists of a method for determining the results of a poll having a smaller than conventional margin of error for any sample size or a smaller than conventional sample size for any margin of error, the method comprising statistically determining the regression of population on sample proportions using an unbiased estimator of the square of the correlation between them. Applying this method to a single sample obtained randomly with replacement from a single population results in credibility intervals that are narrower than conventional confidence intervals, shown to be a product of the regression of sample on population proportions. Not only does the squared-correlation estimator function as a shrinkage coefficient in a Bayesian context, but also this correspondence is shown to apply generally to the estimation of population means as well as proportions. In a preferred embodiment, population-sample regression is used to develop corresponding frequentist and Bayesian estimators.

BRIEF DESCRIPTION OF DRAWINGS

For a more complete understanding of the present invention, reference is made to the accompanying drawings, listed below. Common reference numbers identify the same or equivalent parts of the claimed invention throughout the several figures.

FIG. 1 illustrates the regression of the sample proportion P on the population proportion π.

FIG. 2 illustrates the regression of π on P.

FIG. 3 illustrates Bayesian (π-on-P) as a percent of frequentist error margins in the two-option case.

FIG. 4 illustrates Bayesian (π-on-P) error margins in the two-option case.

FIG. 5 shows the steps to follow in π-on-P point and interval estimation.

FIG. 6, also referred to as Table 1, shows credibility-interval coverage for ±1.96 standard errors in most applications of π-on-P estimation.

FIG. 7, also referred to as Table 2, compares π-on-P with other estimates in a particular example involving three options.

FIG. 8 compares critical with Bayesian and conventional margins of error.

FIG. 9 compares sample sizes required to produce commonly used error margins.

BEST MODE FOR CARRYING OUT THE INVENTION

Simple regression analysis in statistics is a procedure for estimating the linear relationship between a dependent variable and an independent variable in a given population. The relationship for standardized variables is expressed as an equation for a straight line in which the coefficient of the independent, or regressed-on, variable in the equation is determined from a sample. While the dependent variable may vary, the regressed-on variable is fixed. This variable is the sample statistic, mean or proportion, in population-on-sample regression. That is why population-on-sample regression corresponds to Bayesian estimation. The opposite is true of sample-on-population regression, which corresponds to frequentist estimation in which the population parameter, mean or proportion, is fixed.

A point-estimation advantage of the population-on-sample regression procedure is that it generally avoids the problem of relative-frequency estimates equal to zero or one by reasonably adjusting them away from these extreme values. The inward adjustment is the result of regression toward the mean, which for relative frequencies is greater than zero and less than one.

Population-on-sample regression shares the efficiency advantage of Bayesian over traditional estimation, which shows itself in a reduction of least-squares risk functions with a corresponding shortening of confidence intervals. In mental test theory, for example, standard errors of estimate, used to determine confidence intervals for true scores, are shorter than standard errors of measurement, used to determine confidence intervals for observed scores. The “shrinkage coefficient” in this case is the squared true-observed-score correlation, and the standard error of estimate is equal to the standard error of measurement times this (unsquared) correlation.

The method comprising this invention provides for both point and interval estimation for arriving at usable results of a poll. The approach to each follows both frequentist and Bayesian tracks within a regression framework. Development of either the frequentist or the Bayesian point estimator makes no distributional assumption about the population proportions. Distributional assumptions come into play only in the treatment of interval estimation. Normally a regression approach would require three or more observations to accommodate the need to estimate the slope and the intercept of the regression line from data. However, because the mean requires no estimation in the case of single-sample proportions, being the reciprocal of the number of options or categories, the Bayesian as well as the frequentist point estimator developed herein can apply to data involving only two (binomial) or more (multinomial) observations.

The focus here physically has been on a single sample obtained from a single population. Conceptually, the sample may be one of many that the population can produce or the population may be one of many that can produce the sample. The first possibility underlies the frequentist approach and the second the Bayesian approach to statistical inference. The method constituting this invention has adopted the Bayesian approach showing that it can lead through regression to considerably more efficient estimation of population proportions than the frequentist approach, especially for samples no larger than 500.

Although reference is made explicitly to proportions, the method applies equally to other forms of expressing such results of a poll, for example percentages, fractions, and decimal fractions, with appropriate adjustments known by those skilled in the art.

The following steps develop the regression point and interval estimators in the frequentist (P-on-π) case (Step One) and in the Bayesian (π-on-P) case (remaining steps).

Step One—The Regression of P on π

The frequentist approach to point estimation via regression corresponds to the traditional estimation procedure in which, for a sample of size n, nP (an integer) has a binomial or a multinomial distribution with E_(t)(P_(kt))=π_(k) for each option k of a total of K options, t indexing the sample. The regression expressing this or the Bayesian approach involves, for a single sample t, the means μ_(P) and μ_(π) over options

$\left( {\mu_{P} = {\left( {1/K} \right){\sum\limits_{k = 1}^{K}P_{kt}}}} \right),$

the standard deviations S_(P) and σ_(π) over options

$\left( {S_{P}^{2} = {\left( {1/K} \right){\sum\limits_{k = 1}^{K}\left( {P_{kt} - \mu_{P}} \right)^{2}}}} \right),$

and the correlation coefficient ρ_(πP) of P_(kt) and π_(k) over options. To assure that E_(t)(P_(kt))=π_(k), the slope coefficient in the regression

$\begin{matrix} {P_{kt} = {{\left( \frac{S_{P}}{\sigma_{\pi}} \right)\rho_{\pi \; P}\pi_{k}} - {\left( \frac{S_{P}}{\sigma_{\pi}} \right)\rho_{\pi \; P}\mu_{\pi}} + \mu_{P} + \epsilon_{kt}}} & (1) \end{matrix}$

must be equal to one, the population and sample means (μ_(π) and μ_(P)) being equal (to 1/K), so that P_(kt)=π_(k)+ε_(kt), where ε_(kt) denotes sampling error. Since E_(t)(ε_(kt))=0, E_(t)(P_(kt))=π_(k), as in the traditional binomial or multinomial estimation procedure. The regression implication of E_(t)(P_(kt))=π_(k) then is that the correlation ρ_(πP) between π_(k) and P_(kt) must be equal to the ratio of their standard deviations, σ_(π) and S_(P):

$\begin{matrix} {\rho_{\pi \; P} = {\frac{\sigma_{\pi}}{S_{P}}.}} & (2) \end{matrix}$

This result resembles a basic result of classical mental test theory (Gulliksen, 1950) in which π represents a true and P an observed score. The next section will use this result to obtain an estimator of ρ_(πP) ².

Step Two—An Estimator of ρ_(πP) ²

In this as in the previous section, for each option k, π_(k) as the regressed-on variable is assumed to be fixed while P_(kt) can vary over samples t=1,2,3, . . . . Ordinarily the exact value of ρ_(πP) ² is unknown. Because P_(kt) is a proportion, however, σ_(π) ² is expressible in terms of σ_(P) ², the expected value of S_(P) ², for substitution into ρ_(πP) ²=σ_(π) ²/S_(P) ² (the square of Equation 2) to yield an estimator of ρ_(πP) ² in which σ_(P) ² is replaced by S_(P) ²:

$\begin{matrix} {{S_{P}^{2} = {\frac{1}{K}{\sum\limits_{k = 1}^{K}\left( {P_{kt} - \mu_{P}} \right)^{2}}}},} & (3) \end{matrix}$

where μ_(P) is the mean of the K values of P_(kt) in sample t. Without further assumptions or conditions, the following derivation leads to the sample estimator of ρ_(πP) ² in Equation 13.

In Equation 3, μ_(P), equal to 1/K, is the population as well as the sample mean proportion so that S_(P) ², with K rather than (K−1) in the denominator, is an unbiased estimator of σ_(P) ²:

$\begin{matrix} {\sigma_{P}^{2} = {{E_{t}\left( {\left( \frac{1}{K} \right){\sum\limits_{k = 1}^{K}\left( {P_{kt} - \mu_{P}} \right)^{2}}} \right)}.}} & (4) \end{matrix}$

If π_(k)=μ_(π)+δ_(k), where

${{\left( {1/K} \right){\sum\limits_{k = 1}^{K}\delta_{k}}} = 0},{{{then}\mspace{14mu} \sigma_{\pi}^{2}} = {\left( {1/K} \right){\sum\limits_{k = 1}^{K}{\delta_{k}^{2}.}}}}$

As noted in the preceding section, P_(kt)=π_(k)+ε_(kt). The expected values of ε_(kt) and δ_(k)ε_(kt) (equal to δ_(k) times the expected value of ε_(kt)) are equal to zero. Substitution first of π_(k)+ε_(kt) for P_(kt) and then of μ_(π)+δ_(k) for π_(k) in Equation 4 thus, with μ_(π)=μ_(P), leads to

$\begin{matrix} {{\sigma_{P}^{2} = {\sigma_{\pi}^{2} + {\frac{1}{K}{\sum\limits_{k = 1}^{K}\sigma_{\epsilon k}^{2}}}}},} & (5) \end{matrix}$

where, for each option k, σ_(εk) ² is the sampling variance

$\begin{matrix} {\sigma_{\epsilon k}^{2} = {\left( \frac{1}{n} \right){{\pi_{k}\left( {1 - \pi_{k}} \right)}.}}} & (6) \end{matrix}$

Substitution of μ_(π)+δ_(k) for π_(k) in the computation of the mean of Equation 6 over the K values of σ_(εk) ² produces

$\begin{matrix} {{\left( \frac{1}{K} \right){\sum\limits_{k = 1}^{K}\sigma_{\epsilon k}^{2}}} = {\frac{\mu_{\pi}\left( {1 - \mu_{\pi}} \right)}{n} - \frac{\sigma_{\pi}^{2}}{n}}} & (7) \end{matrix}$

since

${{\sum\limits_{k = 1}^{K}{\mu_{\pi}\delta_{k}}} = {{\mu_{\pi}{\sum\limits_{k = 1}^{K}\delta_{k}}} = 0}}\mspace{14mu} {and}\mspace{14mu} {{\sum\limits_{k = 1}^{K}\delta_{k}^{2}} = {K\; {\sigma_{\pi}^{2}.}}}$

Equation 5 thus becomes

$\begin{matrix} {\sigma_{P}^{2} = {\sigma_{\pi}^{2} + \frac{\mu_{\pi}\left( {1 - \mu_{\pi}} \right)}{n} - \frac{\sigma_{\pi}^{2}}{n}}} & (8) \end{matrix}$

or, with 1/K for μ_(π),

$\begin{matrix} {\sigma_{P}^{2} = {\sigma_{\pi}^{2} + \frac{\left( {K - 1} \right)}{K^{2}n} - {\frac{\sigma_{\pi}^{2}}{n}.}}} & (9) \end{matrix}$

Solution of Equation 9 for σ_(π) ² finally yields the expression of σ_(π) ² in terms of σ_(P) ²:

$\begin{matrix} {\sigma_{\pi}^{2} = {{\left( \frac{n}{n - 1} \right)\sigma_{P}^{2}} - {\left( \frac{K - 1}{n - 1} \right){\left( \frac{1}{K} \right)^{2}.}}}} & (10) \end{matrix}$
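Equations 9 and 10 can be checked numerically in the two-option case, where the expectation over samples can be computed exactly from the binomial distribution. The following is a minimal sketch; the values of n and p and the function name `expected_sample_variance` are hypothetical choices for illustration and do not come from the specification.

```python
from math import comb

def expected_sample_variance(p, n):
    """Exact E_t(S_P^2) over all binomial samples of size n when K = 2."""
    total = 0.0
    for x in range(n + 1):
        prob = comb(n, x) * p**x * (1 - p) ** (n - x)
        # For K = 2 both options deviate from the mean 1/2 by the same
        # amount, so S_P^2 reduces to (x/n - 1/2)^2.
        s2 = (x / n - 0.5) ** 2
        total += prob * s2
    return total

K, n, p = 2, 25, 0.6                 # hypothetical illustration values
sigma_pi2 = (p - 1 / K) ** 2         # variance of (p, 1 - p) over the two options
sigma_P2 = expected_sample_variance(p, n)

# Right-hand side of Equation 9, computed from sigma_pi2
eq9_rhs = sigma_pi2 + (K - 1) / (K**2 * n) - sigma_pi2 / n
# Right-hand side of Equation 10, computed from sigma_P2
eq10_rhs = (n / (n - 1)) * sigma_P2 - ((K - 1) / (n - 1)) * (1 / K) ** 2

print(sigma_P2, eq9_rhs)     # the two values should agree
print(sigma_pi2, eq10_rhs)   # the two values should agree
```

Exact enumeration is used instead of random sampling so that the agreement does not depend on a seed or on simulation error.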

The formula for ρ_(πP) ² is thus

$\begin{matrix} {\rho_{\pi \; P}^{2} = \frac{{\left( \frac{n}{n - 1} \right)\sigma_{P}^{2}} - {\left( \frac{K - 1}{n - 1} \right)\left( \frac{1}{K} \right)^{2}}}{S_{P}^{2}}} & (11) \end{matrix}$

so that the estimator of ρ_(πP) ² is

$\begin{matrix} {{\hat{\rho}}_{\pi \; P}^{2} = \frac{{\left( \frac{n}{n - 1} \right)S_{P}^{2}} - {\left( \frac{K - 1}{n - 1} \right)\left( \frac{1}{K} \right)^{2}}}{S_{P}^{2}}} & (12) \\ {{{or},{since}}{S_{P}^{2} = {{\left( {1/K} \right){\sum\limits_{k = 1}^{K}P_{kt}^{2}}} - \left( {1/K} \right)^{2}}},{{\hat{\rho}}_{\pi \; P}^{2} = {1 - \frac{K\left( {1 - {\sum\limits_{k = 1}^{K}P_{kt}^{2}}} \right)}{\left( {n - 1} \right)\left( {{K{\sum\limits_{k = 1}^{K}P_{kt}^{2}}} - 1} \right)}}},{{where}\mspace{14mu} {\sum\limits_{k = 1}^{K}P_{kt}^{2}}}} & (13) \end{matrix}$

is the sum of the squares of the K proportions computed from the sample t of size n. Equation 12 shows that {circumflex over (ρ)}_(πP) ² is not only an increasing function of S_(P) ², K, and n but also an unbiased estimator of ρ_(πP) ², the S_(P) ² in the denominator being fixed. Step Three—Point Estimation: Estimation of π from P

The regressions underlying the developments in this and the preceding section are opposite in direction. Both involve P and π. The development in the preceding section considered π as fixed and P as variable. In the development here, however, the reverse is true: P is fixed, and π is variable. In this development, the fixed-P_(k) sample (k=1,2, . . . ,K) comes from a single population, which is one of any number of possible populations, with their correspondingly different π_(k) values. Although the expected value of a variable P_(k) is π_(k), the expected value of a variable π_(k) is not P_(k), but a value {circumflex over (π)}_(k) somewhere between P_(k) and 1/K. Whereas the direction of regression assumed in the preceding section worked for the development of a formula for {circumflex over (ρ)}_(πP) ², the regression direction taken here is particularly appropriate for the estimation of an unknown π_(k), assumed variable, from a known P_(k), assumed fixed for each option k:

$\begin{matrix} {{{\hat{\pi}}_{k} = {{{\rho_{\pi \; P}\left( \frac{\sigma_{\pi}}{S_{P}} \right)}P_{k}} - {{\rho_{\pi \; P}\left( \frac{\sigma_{\pi}}{S_{P}} \right)}µ_{P}} + µ_{\pi}}},} & (14) \end{matrix}$

where {circumflex over (π)}_(k) is the regression estimate of π_(k), ρ_(πP) is the correlation between P_(k) and π_(k) for the population sampled from, and μ_(P) and μ_(π) are the means and S_(P) and σ_(k) are the standard deviations over options of P_(k) and π_(k), respectively. Since ρ_(πP) ²=σ_(π) ²/S_(P) ², this equation simplifies to

{circumflex over (π)}_(k)=ρ_(πP) ²P_(k)−ρ_(πP) ²μ_(P)+μ_(π),   (15)

where μ_(π) is equal to μ_(P) so that, with both μ_(π) and μ_(P) denoted by μ,

{circumflex over (π)}_(k)=ρ_(πP) ²P_(k)+(1−ρ_(πP) ²)μ  (16)

or, since μ=1/K,

$\begin{matrix} {{\hat{\pi}}_{k} = {{\rho_{\pi \; P}^{2}P_{k}} + {\frac{\left( {1 - \rho_{\pi \; P}^{2}} \right)}{K}.}}} & (17) \end{matrix}$

Estimation of the population proportion corresponding to the observed proportion P_(k) thus requires knowledge only of ρ_(πP) ². If ρ_(πP) ²=1, {circumflex over (π)}_(k)=P_(k); if ρ_(πP) ²=0, {circumflex over (π)}_(k)=1/K. Generally, in practice, {circumflex over (π)}_(k) will be somewhere between P_(k) and 1/K.

Since P_(k) is assumed fixed, substitution of {circumflex over (ρ)}_(πP) ² for ρ_(πP) ² in Equation 17 yields an estimate of {circumflex over (π)}_(k) that is not subject to sampling variation.

FIG. 1 illustrates the regression of P on π and FIG. 2 the regression of π on P. In both figures, the population proportions are fictitious since their actual values are unknown. Knowledge of these values is unnecessary because the only requirements for estimation are the sample proportions in FIG. 1 and the regression line in FIG. 2. The vertical lines define 95% confidence intervals. Based on n=100, the value of {circumflex over (ρ)}_(πP) ² is 0.88, the slope of the regression line in FIG. 2. In addition to their different slopes, the two regression lines notably have different intercepts: 0 in FIG. 1 and 0.03 in FIG. 2. Except when {circumflex over (ρ)}_(πP) ²=1, π-on-P regression produces population-proportion estimates that are greater than zero and less than one.

Two examples provide data to illustrate the π-on-P regression procedure. The first, cited by Tull and Hawkins (1993, pp. 745-746) in the spirit of R. A. Fisher's classic tea-tasting experiment, was a Carnation taste test comparing Coffee-mate to real cream. Of 285 participants who claimed to be able to distinguish between two cups of coffee presented to them, one containing Coffee-mate and the other containing cream, 153 were correct and 132 were incorrect, the corresponding proportions being 0.54 and 0.46. With {circumflex over (ρ)}_(πP) ²=0.42, the corrected proportions were 0.52 and 0.48, respectively. These proportions more strongly than their uncorrected counterparts support the conclusion that people could not tell the difference between Coffee-mate and real cream.

In the second example, a large school district tested three different textbooks for first-year high school algebra. The first was the book used for the past several years; the second and third were new books containing questions taken from recent versions of a statewide mathematics examination. The question format differed in these books, being open-ended in the second book and multiple-choice (as in the statewide examination) in the third book. Two hundred students in different classes used each book. Of the 450 students who passed the statewide examination, 130 had used the first book, 158 the second book, and 162 the third book. The corresponding proportions were 0.29, 0.35, and 0.36, respectively. Substituting these proportions, which sum to 1.00, in Equation 13, together with K=3 and n=450, yields 0.48 for the value of {circumflex over (ρ)}_(πP) ², and using this value for ρ_(πP) ² in Equation 17 yields {circumflex over (π)}_(k) values of 0.31, 0.34, and 0.34 for the three books, respectively. The first and third values notably differ (by 0.02 each) from their uncorrected counterparts while the second, being closer to the mean of 0.33, shows a difference of only 0.01, to two decimal places. If the books had been equally effective, the expected proportion within the passing group would be equal to 0.33 for students using each of the three books.
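The textbook calculation can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function names `rho2_hat` and `pi_hat` are hypothetical, and the inputs are the rounded proportions quoted in the text. Because the sketch carries full precision instead of rounding intermediate values, its final digits need not match the two-decimal figures quoted above.

```python
def rho2_hat(P, n):
    """Squared-correlation estimator of Equation 13.

    P is the list of K sample proportions (summing to 1); n is the sample size.
    Undefined when all proportions are equal, since K * sum(P^2) - 1 is then 0.
    """
    K = len(P)
    sum_sq = sum(p * p for p in P)
    return 1 - K * (1 - sum_sq) / ((n - 1) * (K * sum_sq - 1))

def pi_hat(P, n):
    """Point estimates of Equation 17, with rho2_hat substituted for rho^2."""
    K = len(P)
    r2 = rho2_hat(P, n)
    return [r2 * p + (1 - r2) / K for p in P]

P = [0.29, 0.35, 0.36]   # rounded proportions from the textbook example
n = 450
r2 = rho2_hat(P, n)
est = pi_hat(P, n)

print(round(r2, 2))                 # 0.48
print([round(e, 3) for e in est])   # each estimate lies between P_k and 1/K
```

Each estimate is a weighted average of the observed proportion and 1/K, so the estimates still sum to one whenever the observed proportions do.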

Step 4A—Interval Estimation: Two-Option Case

Reported survey results often include half the size of the 95% confidence interval as the so-called “margin of error.” For K=2, the procedures developed in this invention involve intervals different from the conventional ones. The confidence or credibility intervals appropriate for the K=2 procedures developed here are functions of the standard error of measurement, applicable to the regression of P on π, or the standard error of estimate, applicable to the regression of π on P (Kelley, 1923, 1927). Both standard errors involve the assumption of homoscedasticity: Values of the standard errors of measurement are equal for all values of π_(k), and values of the standard error of estimate are equal for all values of P_(k) (k=1,2,3 . . . ,K). In the case of proportions, as opposed to means of multi-valued variables, this assumption makes sense only when K=2.

Though of less practical value, the standard error of measurement, σ_((P−π))=S_(P)√{square root over (1−ρ_(πP) ²)}, produces confidence intervals directly comparable to the conventional ones. Estimates of σ_((P−π)) are obtainable by using {circumflex over (ρ)}_(πP) ² for ρ_(πP) ² in the formula for σ_((P−π)).

The Carnation data provide an example. The two uncorrected proportions, 0.54 and 0.46, were inaccurate by an amount equal to ±0.06. As the conventional 95% margin of error, this value (0.06) is 1.96 times the square root of 0.5(1−0.5)/285. Use of the standard error of measurement would produce a 95% confidence interval of the same size, to two decimal places. Substituting the value of 0.039 for S_(P) and the {circumflex over (ρ)}_(πP) ² value of 0.42 for ρ_(πP) ² yields σ_((P−π))=0.030, or (multiplying 0.030 by 1.96) a 95% error margin of ±0.06.

Confidence intervals determined from the standard error of measurement are directly comparable to conventionally determined confidence intervals because both are based on the assumption of a fixed π and a variable P. The standard error of estimate, applicable in the regression case of a fixed P and a variable π, has the same formula as the standard error of measurement with the exception that σ_(π) replaces S_(P): σ_(({circumflex over (π)}−π))=σ_(π)√{square root over (1−ρ_(πP) ²)}. Since ρ_(πP) ²=σ_(π) ²/S_(P) ², the standard error of estimate will, except when ρ_(πP) ²=1, be smaller than the standard error of measurement by a factor of ρ_(πP). Conceptually, the standard error of estimate should be smaller than the standard error of measurement because the difference (P−π) contains a varying component representing bias that is absent in the difference ({circumflex over (π)}−π). Credibility intervals for a variable π will therefore generally be smaller than corresponding confidence intervals for a variable P. When K=2, the estimate of σ_(({circumflex over (π)}−π)) corresponding to σ_((P−π)) is equal to the estimate of σ_((P−π)) multiplied by {circumflex over (ρ)}_(πP).

In the Carnation example, with {circumflex over (ρ)}_(πP) ²=0.42, the standard error of estimate is √{square root over (0.42 )} times 0.030 (the standard error of measurement), or 0.019, so that the 95% margin of error (1.96 times 0.019) is ±0.04. This (rounded from 0.037) is considerably smaller than the conventional error margin of ±0.06. The sample of 285 would, in fact, have to be 417 larger (a total of 702 respondents) to achieve the same ±0.04 margin of error conventionally. Since both the confidence and the credibility intervals overlap the chance proportion of 0.50, the data do not support the claim that tasters can tell cream from Coffee-mate.
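The arithmetic of the Carnation example can be retraced as follows. This is a sketch that takes the values stated in the text as given inputs (S_P = 0.039, the estimate 0.42, and the rounded margin 0.037); the variable names are hypothetical, and the sample-size step assumes the conventional worst-case formula 1.96·√(0.25/n).

```python
from math import sqrt, ceil

n = 285        # sample size in the Carnation example
S_P = 0.039    # standard deviation over options, as stated in the text
rho2 = 0.42    # squared-correlation estimate, as stated in the text
z = 1.96       # multiplier for a 95% interval

# Conventional margin of error at the chance proportion 0.5
conventional = z * sqrt(0.5 * (1 - 0.5) / n)

# Standard error of measurement (P-on-pi) and of estimate (pi-on-P)
sem = S_P * sqrt(1 - rho2)
see = sqrt(rho2) * sem
bayes_margin = z * see

# Sample size a conventional interval would need for the text's rounded
# margin of 0.037 (assumption: worst-case proportion 0.5)
target = 0.037
n_needed = ceil((z * 0.5 / target) ** 2)

print(round(conventional, 2), round(sem, 3), round(see, 3), round(bayes_margin, 2))
print(n_needed, n_needed - n)
```

Under these inputs the required conventional sample size comes out at 702, which is 417 more than the 285 respondents actually polled.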

For the ±0.04 credibility interval to be comparable to the conventional ±0.06 confidence interval, it must also contain 95% of the area under its frequency curve. The next section will investigate the extent to which this is the case not only here but also more generally.

Question—Do ±1.96 Standard Errors Constitute 95% Credibility Intervals in π-on-P Estimation?

The answer, generally, is yes, as this section demonstrates.

Corresponding to the assumption of a binomial distribution for sample proportions is the assumption of a beta distribution for population proportions. One distribution is the conjugate of the other. Dirichlet distributions are correspondingly conjugates of multinomial distributions. Such assumptions of conjugate distributions are common in Bayesian analyses (e.g., Good, 1965, Chapter 3). Under the beta-distribution assumption, not only does the ±0.04 credibility interval of the Carnation example contain 95% of the area under its frequency curve but also, as Table 1 shows, {circumflex over (π)}±1.96 σ_(({circumflex over (π)}−π)) credibility intervals ranging from ±0.01 to ±0.10 of point estimates ({circumflex over (π)}) between 0.05 and 0.50 will generally contain 95% of possible π values. (No subscript for π is necessary here because a beta distribution involves only two proportions, π and 1−π.) The coverages shown in Table 1 are based on calculations, not Monte Carlo sampling. Table 1 shows credibility-interval proportions for values of π not only between 0.05 and 0.50 but also, though indirectly, between 0.50 and 0.95. Because beta distributions having mean values between 0.50 and 0.95 are mirror images of beta distributions having mean values between 0.05 and 0.50, the credibility-interval proportion for {circumflex over (π)} is equal to the credibility-interval proportion for 1−{circumflex over (π)}, provided that both distributions have equal standard deviations (σ_(({circumflex over (π)}))).

Table 1 shows credibility-interval proportions as a function of beta-distribution means and standard deviations because these are the parameters involved in the determination of credibility intervals. Beta distributions, however, are functions of two parameters, a and b, related to beta-distribution means ({circumflex over (π)}) and standard deviations (σ_(({circumflex over (π)}))), as follows:

$\begin{matrix} {a = {\hat{\pi}\left( {a + b} \right)}} & (18) \end{matrix}$

and

$\begin{matrix} {{b = {\left( {1 - \hat{\pi}} \right)\left( {a + b} \right)}},} & (19) \end{matrix}$

where

$\begin{matrix} {{a + b} = {\frac{\hat{\pi}\left( {1 - \hat{\pi}} \right)}{\sigma_{({\hat{\pi} - \pi})}^{2}} - 1.}} & (20) \end{matrix}$

For the Carnation data, a+b=(0.52)(1−0.52)/(0.019)²−1, or 690, so that a=0.52(690), or 359, and b=(1−0.52)690, or 331, and for these values of a and b the interval between 0.52−1.96(0.019) and 0.52+1.96(0.019) contains 95% of the area under the beta-distribution frequency curve. Table 1 shows this result in the row for 0.50 (close to 1−0.52) and the column for 0.020 (close to 0.019).
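The coverage calculation for the Carnation data can be approximated with a short numerical sketch. It assumes nothing beyond the beta density and the mean and standard deviation given above; the function names and the choice of Simpson's rule with 2000 subintervals are illustrative assumptions, not the procedure used to build Table 1.

```python
from math import lgamma, log, exp

def beta_params(mean, sd):
    """Beta parameters a and b from a mean and standard deviation (Equations 18-20)."""
    total = mean * (1 - mean) / sd**2 - 1      # a + b
    return mean * total, (1 - mean) * total

def beta_pdf(x, a, b):
    """Beta density, evaluated on the log scale to avoid overflow for large a and b."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))

def coverage(mean, sd, z=1.96, steps=2000):
    """Area under the beta density between mean - z*sd and mean + z*sd.

    Uses composite Simpson's rule; steps must be even.
    """
    a, b = beta_params(mean, sd)
    lo = max(mean - z * sd, 1e-12)
    hi = min(mean + z * sd, 1 - 1e-12)
    h = (hi - lo) / steps
    total = beta_pdf(lo, a, b) + beta_pdf(hi, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(lo + i * h, a, b)
    return total * h / 3

a, b = beta_params(0.52, 0.019)
print(round(a + b), round(a), round(b))   # 690 359 331
print(coverage(0.52, 0.019))              # should be close to 0.95
```

The same `coverage` function can be evaluated at other means and standard deviations to explore how closely ±1.96 standard errors track 95% coverage.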

The two standard errors, the standard error of measurement and the standard error of estimate, differ not only in the lengths of the confidence or credibility intervals that they produce but also in one other important respect: While the standard error of measurement, computed from S_(P) and {circumflex over (ρ)}_(πP) ² (itself a function of S_(P)), is subject to sampling variation due to the possible variation of P for each option over samples, the standard error of estimate does not change because, under the π-on-P assumption governing its use, each P remains constant while only π can change for each option. In both these respects, the standard error of estimate is superior to the standard error of measurement for use in the determination of credibility or confidence intervals for population proportions.

The results shown in Table 1 are applicable to Dirichlet as well as beta distributions because a beta distribution describes each Dirichlet proportion if all the other proportions are aggregated as its complement.

Step 4B—Interval Estimation: Case of Two or More Options

The standard error of estimate provides the basis for a common credibility interval, applicable particularly for proportions when K=2. The assumption of a Dirichlet prior distribution for the population proportions makes possible the determination of a different interval for each proportion when K≥3. (When K=2, the two intervals are the same.) If τ designates the total of the parameters of a Dirichlet distribution, then the posterior variance of π_(k) for option k is

$\begin{matrix} {{{{Var}\left( \pi_{k} \middle| P_{k} \right)} = \frac{{\hat{\pi}}_{k}\left( {1 - {\hat{\pi}}_{k}} \right)}{\tau + n + 1}},} & (21) \end{matrix}$

where P_(k) is the observed proportion for option k. Use of this equation, with {circumflex over (π)}_(k) from Equation 17 estimating the expected value of π_(k), requires knowledge of τ.

Since E(π_(k)|P_(k))=(τπ_(k)+nP_(k))/(τ+n) for a Dirichlet distribution, the Dirichlet shrinkage coefficient corresponding to {circumflex over (ρ)}_(πP) ² here is n/(τ+n), and so for the π-on-P procedure

$\begin{matrix} {\tau_{\pi|P} = \frac{n\left( {1 - {\hat{\rho}}_{\pi \; P}^{2}} \right)}{{\hat{\rho}}_{\pi \; P}^{2}}} & (22) \end{matrix}$

or, from Equation 13,

$\begin{matrix} {\tau_{\pi|P} = {\frac{K\; \left( {1 - {\sum\limits_{k = 1}^{K}P_{k}^{2}}} \right)}{{K{\sum\limits_{k = 1}^{K}P_{k}^{2}}} - 1 - {\left( {K - 1} \right)/n}}.}} & (23) \end{matrix}$

According to Fienberg and Holland (1973), the minimax value of τ is √{square root over (n)} and the maximum-likelihood estimator of τ is

$\begin{matrix} {\tau_{ML} = {\frac{K\left( {1 - {\sum\limits_{k = 1}^{K}P_{k}^{2}}} \right)}{{K{\sum\limits_{k = 1}^{K}P_{k}^{2}}} - 1}.}} & (24) \end{matrix}$

Using the values of τ computed from Equations 23 and 24 as well as √{square root over (n)} in Equation 21 with data from the three-textbook example presented earlier produced the 95% credibility or confidence intervals shown in Table 2, which includes corresponding traditional, frequentist results.

Table 2 shows separate error margins for the three different textbook groups, as well as their different population-proportion estimates under the four different procedures, corresponding in the case of the non-frequentist procedures to their different values of τ: 21, 232, and 487, respectively, for minimax, Fienberg and Holland, and π-on-P. The π-on-P procedure produced the narrowest margins of error. This result is not surprising since, as Equations 23 and 24 make clear, τ_(π|P)>τ_(ML), the difference diminishing as n gets large. According to Equation 22, the τ for the minimax procedure (√{square root over (n)}) will also be smaller than τ_(π|P) unless {circumflex over (ρ)}_(πP) ²/(1−{circumflex over (ρ)}_(πP) ²)>√{square root over (n)}, which is not the case here. Since the standard errors of the three non-frequentist procedures ranged between 0.01 and 0.02, the credibility intervals, as Table 1 shows, all have approximately 95% coverage.
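The comparison of the three τ values can be sketched as follows, using the rounded textbook proportions. The function names are hypothetical, and the posterior mean assumes a prior mean of 1/K for every option, which is consistent with the shrinkage toward 1/K in Equation 17 but is stated here as an assumption. Equation 23 evaluated at full precision can give a value a few units away from the 487 quoted above, which corresponds to Equation 22 evaluated at the rounded estimate 0.48.

```python
from math import sqrt

def sum_squares(P):
    return sum(p * p for p in P)

def tau_ml(P):
    """Maximum-likelihood tau of Equation 24."""
    K, s = len(P), sum_squares(P)
    return K * (1 - s) / (K * s - 1)

def tau_regression(P, n):
    """pi-on-P tau of Equation 23."""
    K, s = len(P), sum_squares(P)
    return K * (1 - s) / (K * s - 1 - (K - 1) / n)

def posterior_mean(p_k, tau, n, K):
    """Posterior mean of pi_k, assuming a prior mean of 1/K (assumption)."""
    return (tau / K + n * p_k) / (tau + n)

def margin(p_k, tau, n, K, z=1.96):
    """z times the posterior standard deviation from Equation 21."""
    m = posterior_mean(p_k, tau, n, K)
    return z * sqrt(m * (1 - m) / (tau + n + 1))

P, n = [0.29, 0.35, 0.36], 450
K = len(P)
taus = {
    "minimax": sqrt(n),
    "ML": tau_ml(P),
    "pi-on-P": tau_regression(P, n),
}
for name, tau in taus.items():
    print(name, round(tau), [round(margin(p, tau, n, K), 3) for p in P])
```

A larger τ enters the denominator of Equation 21 directly, which is why the procedure with the largest τ yields the narrowest margins in this sketch.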

Depending on which of the four estimation procedures they used, investigators looking at the study's results might come to entirely different conclusions. Since 0.33 bordered or lay outside the confidence or credibility interval for the old textbook in all but the π-on-P procedure, investigators using the three other procedures might conclude that the old textbook was less effective than the two new ones. An investigator using the π-on-P procedure, however, would not reach that conclusion. All three credibility intervals produced by that procedure, despite being generally narrower than the others by 0.02, contain the chance proportion 0.33. The conclusion following from this result is that further study is necessary before selecting a textbook for general use.

Comparison of Bayesian and Frequentist Procedures in the Two-option Case

FIGS. 3 and 4 provide a broader view of the frequentist and Bayesian (π-on-P) regression procedures. Limited to K=2, these figures show, for a range of sample sizes and P values, Bayesian margins of error as percentages of frequentist margins of error (FIG. 3) and actual Bayesian margins of error (FIG. 4). The three high curves in each figure represent high {circumflex over (ρ)}_(πP) ² values, approaching one for n=500, while the bottom curve (for P=0.54) represents a comparatively low {circumflex over (ρ)}_(πP) ² value. When {circumflex over (ρ)}_(πP) ² is high, Bayesian and frequentist margins of error are very nearly equal, as are corresponding point estimates; when {circumflex over (ρ)}_(πP) ² is low, Bayesian margins of error are low and point estimates are close to the mean, relative to their frequentist counterparts. For P values that are very close to the mean, {circumflex over (ρ)}_(πP) ² can be so low that Bayesian point estimates are for all practical purposes equal to the mean, with margins of error effectively equal to zero. This is the case for P=0.52 when n≦500.

Is π-on-P an Empirical or a Purely Bayesian Procedure?

In regressing the observed proportion P toward the mean, 1/K, the squared correlation {circumflex over (ρ)}_(πP) ² resembles the shrinkage coefficient w in Fienberg and Holland (1973) or 1−B in Efron and Morris (1973) and Morris (1983). Because the developments using w or 1−B involve empirical Bayesian estimation, this resemblance suggests that π-on-P regression may also be empirical Bayesian. That is not the case, however.

The π-on-P procedure is a regression, not an empirical Bayesian, procedure. The difference is important. While estimates in both the π-on-P and the Fienberg and Holland (1973) procedures are expected values of π given P, both π and P may vary in empirical Bayesian estimation while only π may vary in estimation by π-on-P regression. If P as well as π were to vary in π-on-P regression, then the credibility intervals computed from the standard error of estimate would be too small, as the coverage proportions in Table 1 would be too large. The more apt comparison is with pure Bayesian estimation because in both this and π-on-P regression P is fixed while only π may vary. Because the regressed-on variables are fixed in regression estimation, the coverage proportions in Table 1 and their corresponding confidence intervals are accurate.

Shrinkage Coefficients as Slope Coefficients in Regression

Shrinkage coefficients may be interpretable as slope coefficients in regression. Stigler (1990) made this observation in relation to the work originated by James and Stein (1961), involving means. The squared correlation ρ_(πP) ² is in fact the single-sample binary (0-1) data counterpart to the multi-sample shrinkage coefficient 1−B cited by Morris (1983) under the empirical Bayes assumption of normal distributions for both sample and population means. If for m samples, with m>2, σ_((X̄−μ)) ² and σ_(μ) ² are the respective variances of these distributions, then, according to Morris,

1−B=σ_(μ) ²/(σ_(μ) ²+σ_((X̄−μ)) ²), which is the square of the correlation between μ and X̄. In view of Equations 23 and 24, the shrinkage coefficient w in the Fienberg and Holland (1973) procedure is a consistent estimator of the squared correlation between π and P, the shrinkage coefficient developed here ({circumflex over (ρ)}_(πP) ²) being an unbiased estimator of it.
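The identification of 1−B with the squared correlation between μ and X̄ can be checked in a few lines. The sketch below assumes, beyond what is stated above, that the sampling error X̄−μ is uncorrelated with μ; the symbols otherwise follow the text.

```latex
\begin{aligned}
\operatorname{Cov}(\mu,\bar{X}) &= \operatorname{Cov}\bigl(\mu,\ \mu+(\bar{X}-\mu)\bigr) = \sigma_{\mu}^{2},\\
\operatorname{Var}(\bar{X}) &= \sigma_{\mu}^{2}+\sigma_{(\bar{X}-\mu)}^{2},\\
\rho_{\mu\bar{X}}^{2} &= \frac{\operatorname{Cov}(\mu,\bar{X})^{2}}{\operatorname{Var}(\mu)\,\operatorname{Var}(\bar{X})}
 = \frac{\sigma_{\mu}^{2}}{\sigma_{\mu}^{2}+\sigma_{(\bar{X}-\mu)}^{2}} = 1-B.
\end{aligned}
```

Under the same assumption, the slope in the regression of μ on X̄, Cov(μ, X̄)/Var(X̄), reduces to the same ratio, which is why the shrinkage coefficient can be read as a slope coefficient.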

In the James-Stein case, σ_((X̄−μ)) ²=1 so that the risk function of the traditional estimator X̄ is equal to one for each value of μ. If μ has a normal distribution, then the posterior variance of μ given X̄ is equal to 1−B and, as Efron and Morris (1973) observed, the risk function of the estimator (1−B)X̄, which is the posterior mean of μ given X̄ assuming μ to have a mean of zero, is smaller than the risk function of X̄ by an amount equal to B. The estimator (1−B)X̄ is the James-Stein estimator if B is replaced by

$(m-2)\Big/\sum_{i=1}^{m}\bar{X}_{i}^{2},$

whose expected value is equal to B because, with σ_((X̄−μ)) ²=1 and 1/B the variance of $\bar{X}$, the quantity $B\sum_{i=1}^{m}\bar{X}_{i}^{2}$ has a χ² _(m) distribution with negative first moment equal to 1/(m−2). The James-Stein shrinkage coefficient

$1-(m-2)\Big/\sum_{i=1}^{m}\bar{X}_{i}^{2}$

is thus interpretable as an unbiased estimator of the square of the correlation between μ and X̄, or as an unbiased estimator of the slope coefficient in the regression of μ on X̄, the intercept being equal to zero.
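The unbiasedness argument above can be checked numerically. The following is a minimal simulation sketch, not part of the disclosed method: it assumes each X̄_i is drawn from a normal distribution with mean zero and variance 1/B (prior variance plus a unit sampling variance), and the function name and parameters are hypothetical.

```python
import random


def mean_js_shrinkage_term(m, prior_var, reps=20000, seed=0):
    """Average of (m - 2) / sum(Xbar_i^2) over simulated replications.

    Assumption: each Xbar_i ~ N(0, prior_var + 1), i.e. variance 1/B
    with unit sampling variance, as in the James-Stein case.
    """
    rng = random.Random(seed)
    sd = (prior_var + 1.0) ** 0.5  # marginal standard deviation of Xbar_i
    total = 0.0
    for _ in range(reps):
        # Sum of squares of the m simulated sample means
        sum_sq = sum(rng.gauss(0.0, sd) ** 2 for _ in range(m))
        total += (m - 2) / sum_sq
    return total / reps


m, prior_var = 10, 1.0
B = 1.0 / (prior_var + 1.0)  # theoretical value the average should approach
print("B =", B, "simulated mean =", mean_js_shrinkage_term(m, prior_var))
```

With these illustrative settings the simulated average should lie close to B, so that one minus it lies close to 1−B.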

Since {circumflex over (ρ)}_(πP) ² is also an unbiased estimator, it corresponds in single-sample proportions estimation to the James-Stein shrinkage coefficient in multi-sample means estimation.

Bayesian Versus Conventional Margins of Error in the Estimation of Population Proportions

The use of Bayesian (π-on-P) estimation of a population proportion requires an amended definition of margin of error. In conventional or classical estimation, the margin of error depends only on sample size. In Bayesian estimation, the margin of error varies not only with sample size but also with the estimated population proportion obtained from the observed sample proportion. The margin of error that is meaningful in Bayesian estimation is the difference between 0.50 and the estimated population proportion. This margin of error is called the critical margin of error. When the estimated population proportion is 0.53, for example, the critical margin of error is 0.03 (0.53-0.50). If 0.53 is the Bayesian estimate of the population proportion in a sample large enough to produce a margin of error equal to 0.03 for an estimated population proportion of 0.53, then the conclusion from this result is that the population proportion is equal to 0.53 plus or minus 0.03 or, in other words, that the population proportion is marginally larger than 0.50. If for the same sample size the population-proportion estimate is larger than 0.53 (or smaller than 0.47), then that estimate plus or minus its error margin will exclude 0.50, as illustrated by the accompanying graph for a sample of size 300 (FIG. 8).
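The check described above, whether an estimate plus or minus its error margin excludes 0.50, can be sketched in code using the relations recited in the claims for {circumflex over (ρ)}_(πP) ², the regression estimate, τ_(π|P), and the posterior variance. This is an illustrative sketch only: the function names are hypothetical, and clamping the estimated squared correlation to the 0-1 interval (so that a negative estimate yields the mean 1/K with zero margin) is an assumption made here, not a step stated in the text.

```python
import math


def pi_on_p_estimates(props, n, z=1.96):
    """Return (rho2, estimates, margins) for K sample proportions from n draws."""
    K = len(props)
    s = sum(p * p for p in props)  # sum of squared sample proportions
    denom = (n - 1) * (K * s - 1)
    # Estimated squared correlation; clamped to [0, 1] (assumption, see lead-in).
    if denom <= 0:
        rho2 = 0.0
    else:
        rho2 = min(1.0, max(0.0, 1 - K * (1 - s) / denom))
    # Regression of pi on P: shrink each proportion toward the mean 1/K.
    estimates = [rho2 * p + (1 - rho2) / K for p in props]
    if rho2 == 0.0:
        # tau is unbounded, so the posterior variance is treated as zero.
        margins = [0.0] * K
    else:
        tau = n * (1 - rho2) / rho2
        margins = [z * math.sqrt(e * (1 - e) / (tau + n + 1)) for e in estimates]
    return rho2, estimates, margins


def excludes_chance(estimate, margin, chance=0.5):
    """True when the interval estimate +/- margin does not contain `chance`."""
    return abs(estimate - chance) > margin


for p in (0.56, 0.57):
    rho2, est, marg = pi_on_p_estimates([p, 1 - p], 300)
    print(p, round(est[0], 4), round(marg[0], 4), excludes_chance(est[0], marg[0]))
```

Under these assumptions and with n=300, a sample proportion of 0.56 gives an interval that still contains 0.50, while a sample proportion of 0.57 gives one that excludes it.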

FIG. 9 shows a table that compares the sample sizes required by conventional and Bayesian estimation of population proportions to produce commonly used error margins. In the case of Bayesian estimation, the error margins are critical margins of error. For every error margin, Bayesian estimation requires a smaller sample than conventional estimation.

The present invention has been particularly shown and described with respect to certain preferred embodiments and features thereof. However, it should be readily apparent to those of ordinary skill in the art that various changes and modifications in form and detail may be made without departing from the spirit and scope of the invention as set forth in the appended claims. The invention illustratively disclosed herein may be practiced without any element which is not specifically disclosed herein. 

1. A method for determining the results of a poll involving estimated proportions that have a smaller than conventional margin of error for any sample size or a smaller than conventional sample size for any margin of error, by statistically determining the regression of population on sample proportions using an unbiased estimator of the square of the correlation between them.
 2. A method of determining a point estimate of a population proportion comprising the steps of: 1) determining the estimated regression coefficient in accordance with the relation $\hat{\rho}_{\pi P}^{2} = 1 - \frac{K\left(1 - \sum_{k=1}^{K} P_{kt}^{2}\right)}{(n-1)\left(K\sum_{k=1}^{K} P_{kt}^{2} - 1\right)},$ which is an unbiased estimator of ρ_(πP) ², where P is a sample and π a population proportion, ρ_(πP)=correlation between π and P over options, {circumflex over (ρ)}_(πP) ²=estimator of square of ρ_(πP), k=each option of a total of K options, and $\sum_{k=1}^{K} P_{kt}^{2}$=sum of the squares of the K proportions computed from the sample t of size n; and 2) determining the point estimate by the regression of π on P: $\hat{\pi}_{k} = \rho_{\pi P}^{2} P_{k} + \frac{1 - \rho_{\pi P}^{2}}{K},$ where ρ_(πP) ² is estimated by {circumflex over (ρ)}_(πP) ².
 3. A method for determining the credibility interval for {circumflex over (π)}_(k) from the posterior variance of π_(k) for option k: $\operatorname{Var}\left(\pi_{k} \mid P_{k}\right) = \frac{\hat{\pi}_{k}\left(1 - \hat{\pi}_{k}\right)}{\tau + n + 1},$ where τ designates the total of the parameters of a Dirichlet distribution, estimated by $\tau_{\pi|P} = \frac{n\left(1 - \hat{\rho}_{\pi P}^{2}\right)}{\hat{\rho}_{\pi P}^{2}}$ or, substituting for {circumflex over (ρ)}_(πP) ², $\tau_{\pi|P} = \frac{K\left(1 - \sum_{k=1}^{K} P_{k}^{2}\right)}{K\sum_{k=1}^{K} P_{k}^{2} - 1 - (K-1)/n}.$
 4. A method of determining a credibility interval, equal to {circumflex over (π)}_(k) plus or minus 1.96 times the square root of Var(π_(k)|P_(k)), that in all practical cases contains 95 percent of the possible values of π_(k), while being smaller than any corresponding confidence or other credibility interval.